A Word-level Morphosyntactic Analyzer for Basque

Authors

  • Itziar Aduriz
  • Eneko Agirre
  • Izaskun Aldezabal
  • Xabier Arregi
  • Jose Maria Arriola
  • Xabier Artola
  • Koldo Gojenola
  • A. Maritxalar
  • Kepa Sarasola
  • Miriam Urkia
Abstract

This work presents the development and implementation of a full morphological analyzer for Basque, an agglutinative language. Several problems (phrase structure inside word-forms, noun ellipsis, multiplicity of values for the same feature and the use of complex linguistic representations) have forced us to go beyond the morphological segmentation of words, and to include an extra module that performs a full morphosyntactic parsing of each word-form. A unification-based word-level grammar has been defined for that purpose. The system has been integrated into a general environment for the automatic processing of corpora, using TEI-conformant SGML feature structures.
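
To make the idea of a unification-based word-level grammar concrete, the following is a minimal sketch of feature-structure unification over nested dictionaries. It is not the authors' grammar or formalism: the function `unify`, the feature names (`lemma`, `cat`, `agr`, `num`, `case`) and the segmentation of the Basque word-form "etxean" ("in the house") into a stem and a suffix bundle are assumptions chosen purely for illustration.

```python
from typing import Optional, Union

# A feature value is either an atomic string or a nested feature structure.
Value = Union[str, dict]


def unify(a: Value, b: Value) -> Optional[Value]:
    """Return the unification of two feature values, or None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # incompatible values for the same feature
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    # Atomic values (or an atom against a structure) unify only if equal.
    return a if a == b else None


# Hypothetical morpheme descriptions, as a segmenter might hand them over.
stem = {"lemma": "etxe", "cat": "noun"}
suffix = {"cat": "noun", "agr": {"num": "sg", "case": "inessive"}}

# Combining stem and suffix yields one structure for the whole word-form.
word = unify(stem, suffix)

# A conflicting value for an existing feature makes unification fail.
clash = unify(word, {"agr": {"num": "pl"}})
```

In this sketch `word` carries the stem's lemma together with the suffix's agreement features, while `clash` is `None` because `num` cannot be both `sg` and `pl`. A real word-level grammar would add rules that decide which morphemes may combine and how features percolate; the sketch only shows the unification step itself.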

Similar resources

Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words

Automatic lemmatization is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma (base form) to each word in a running text is not trivial, since for instance, nouns inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, sin...

Exploring Treebank Transformations in Dependency Parsing

This paper presents a set of experiments performed on parsing the Basque Dependency Treebank. We have concentrated on treebank transformations, maintaining the same basic parsing algorithm across the experiments. The experiments can be classified in two groups: 1) feature optimization, which is important mainly due to the fact that Basque is an agglutinative language, with a rich set of morphos...

A word-grammar based morphological analyzer for agglutinative languages

Agglutinative languages present rich morphology and for some applications they need deep analysis at word level. The work here presented proposes a model for designing a full morphological analyzer. The model integrates the two-level formalism and a unification-based formalism. In contrast to other works, we propose to separate the treatment of sequential and non-sequential morphota...

An Event Related Field Study of Rapid Grammatical Plasticity in Adult Second-Language Learners

The present study used magnetoencephalography (MEG) to investigate how Spanish adult learners of Basque respond to morphosyntactic violations after a short period of training on a small fragment of Basque grammar. Participants (n = 17) were exposed to violation and control phrases in three phases (pretest, training, generalization-test). In each phase participants listened to short Basque phras...

Different Issues in the Design of a Lemmatizer/Tagger for Basque

This paper presents relevant issues that have been considered in the design of a general purpose lemmatizer/tagger for Basque (EUSLEM). The lemmatizer/tagger is conceived as a basic tool necessary for other linguistic applications. It uses the lexical data base and the morphological analyzer previously developed and implemented. Due to the characteristics of the language, the tagset here propos...

Publication date: 2000